Scaling Beyond One Rack and Sizing of Hadoop Platform
Abstract
This paper examines two configuration choices for the Hadoop platform. First, we establish the performance implications of expanding an existing Hadoop cluster beyond a single rack. Second, we compare the performance of clusters of different sizes. The study also examines the disk latency constraints observed on the test cluster during our experiments and discusses their impact on overall performance. The testing approaches described in this work offer insight into the Hadoop environment for companies looking either to expand an existing Big Data analytics platform or to implement one for the first time.
Journal:
- Scalable Computing: Practice and Experience
Volume 16, Issue
Pages -
Publication date: 2015